Abstract



The knowledge of channel statistics can be very helpful in making sound opportunistic spectrum 

access decisions. It is therefore desirable to be able to efficiently and accurately estimate channel 

statistics. In this paper we study the problem of optimally placing sensing times over a time window so 

as to get the best estimate on the parameters of an on-off renewal channel. We are particularly interested 

in a sparse sensing regime with a small number of samples relative to the time window size. Using 

Fisher information as a measure, we analytically derive the best and worst sensing sequences under a 

sparsity condition. We also present a way to derive the best/worst sequences without this condition using 

a dynamic programming approach. In both cases the worst turns out to be the uniform sensing sequence, 

where sensing times are evenly spaced within the window. With these results we argue that without a

priori knowledge, a robust sensing strategy should be a randomized strategy. We then compare different

random schemes using a family of distributions generated by the circular β ensemble, and propose an

adaptive sensing scheme to effectively track time-varying channel parameters. We further discuss the

applicability of compressive sensing for this problem.
I. Introduction

Recent advances in software defined radio and cognitive radio [1] have given wireless devices
greater ability and opportunity to dynamically access spectrum, thereby potentially significantly
improving spectrum efficiency and user performance [2], [3]. To be able to fully utilize spectrum
availability (either as a secondary user seeking opportunities of idle periods in the presence of 
primary users, or as one of many peer users in a multi-user system seeking channels with the 
best condition), a key enabling ingredient in dynamic spectrum access is high quality channel 
sensing that allows the user to obtain accurate real-time information on the condition of wireless 
channels. 

Spectrum sensing is often studied in two contexts: at the physical layer and at the MAC layer. 
Physical layer spectrum sensing typically focuses on the detection of instantaneous primary user 
signals. Several detection methods, such as matched filter detection, energy detection and feature 
detection, have been proposed for cognitive radios [4]. MAC layer spectrum sensing [5], [6] is
more of a resource allocation issue, where we are concerned with the scheduling problem of 
when to sense the channel and the estimation problem of extracting statistical properties of the 
random variation in the channel, assuming that when we decide to sense the physical layer can 
provide sufficiently accurate results on instantaneous channel availability. Such channel statistics 
can be very helpful in making good channel access decisions, and most studies on opportunistic 
spectrum access assume such knowledge. 

In this paper we focus on the scheduling of channel sensing and study the effect different 
scheduling algorithms have on the accuracy of the resulting estimate we obtain on channel 
parameters. In particular, we are interested in the sparse sensing/sampling regime where we can 
use only a limited number of measurements over a given period of time. The goal is to decide
how this limited number of measurements should be scheduled so as to minimize the estimation
error within the maximum likelihood (ML) estimator framework. Throughout the paper the terms 
sensing and sampling will be used interchangeably. 

MAC layer channel estimation within the context of cognitive radios has been studied in
recent years. Below we review the works most relevant to the present paper. Kim and Shin [7]
introduced an ML estimator for renewal channels using a uniform sampling/sensing scheme where
samples of the channel are taken at regular time intervals. A more accurate, but also much
more computationally costly, Bayesian estimator was introduced in [8], again based on uniform
sensing. The relationship between estimation accuracy, the number of samples taken,
and the channel state transition probabilities was analyzed in [9], using the sampling and
estimation framework of [5] and focusing on Markovian channels. A Hidden Markov Model (HMM)
based channel status predictor using reinforcement learning techniques was proposed in [10].
This predictor predicts the next channel state based on past information obtained through
uniformly sampling the channel. A channel estimation technique based on wavelet transform
followed by filtering was presented in [11]. This method relies on dense sampling of the channel.

In most of the above cited work, the focus is on the estimation problem given (sufficiently 
dense) uniform sampling of the channel, i.e., with equal time periods between successive samples. 
This scheme will be referred to as uniform sensing in the remainder of this paper. By contrast, 
sampling schemes where time intervals between successive samples are drawn from a certain 
probability distribution will be referred to as random sensing throughout the paper. We observe 
that due to constraints on time, energy, memory and other resources, a user may wish to perform 
channel sensing at much lower frequencies while still hoping for good estimates. This could be 
relevant for instance in cases where a user wants to track the channel condition in between active 
data communication, or where a user needs to track a large number of different channels. It is 
this sparse sampling scenario that we will focus on in this study, and the goal is to judiciously
schedule this limited number of samples.

Our main contributions are summarized as follows. 

• We demonstrate that when sampling is done sparsely, random sensing significantly outperforms
uniform sensing.

• In the special case of exponentially distributed on/off durations, we derive tight lower and
upper bounds on the Fisher information under a sparsity condition, while obtaining the best
and worst possible sampling schemes measured by the Fisher information. We show that
uniform sensing is the worst one can do; any deviation from it improves the estimation
accuracy.

• We present a dynamic programming approach to obtain the best and worst sampling sequences
in the more general case without the sparsity condition.

• We show that under the same channel statistics and the same average sampling interval (or
frequency), a random sensing scheme affects the estimation accuracy through the higher-order
central moments of the sampling intervals, and use the circular β ensemble to study
a family of distributions.

• We present an adaptive random sensing scheme that can very effectively track time-varying
channel parameters, and is shown to outperform its counterpart using uniform sensing.

The remainder of this paper is organized as follows: Section II presents the channel models and
Section III gives the details of the ML estimator. Then in Section IV we present how the sampling
scheme affects the estimation performance; the best and worst sensing sequences with and
without a sparse sampling condition are obtained. In Section V we use a family of distributions
generated by the circular β ensemble to examine different random sampling schemes. Section
VI presents an adaptive random sensing scheme, and Section VII discusses the applicability of
compressive sensing in this problem. Section VIII concludes the paper.

II. The Channel Model 

In this paper we will limit our attention to MAC layer spectrum sensing as mentioned in the 

introduction. Within this context, the channel state perceived by a secondary user is represented 

by a binary random variable. This is a model commonly used in a large volume of literature, 

from channel estimation (e.g., Q, flU) to opportunistic spectrum access (e.g., [6J) to spectrum 

measurement (e.g., [12]). Specifically, let $Z(t)$ denote the state of the channel at time $t$, such

that

$$Z(t) = \begin{cases} 1 & \text{if the channel is sensed busy at time } t, \\ 0 & \text{otherwise.} \end{cases}$$

The advantage of such a model is its simplicity and tractability in many instances. The 
weakness lies in the fact that the actual energy present or detected in the channel is hardly 
binary. The raw channel measurement data will have to go through a binary hypothesis test 
(e.g., via thresholding) to be reduced to the above form, a process that comes with probabilities 
of error. Consequently, the channel is sensed to be in either state with a detection probability 
and a false alarm probability. 

In this paper our focus is on extracting and estimating essential statistics given a sequence of 
measured channel states (0s and 1s) rather than the binary detection of channel state (deciding
between 0 and 1 given the energy reading). For this purpose, we will assume that the channel
state measurements are error-free. If we have side information on what the detection and false
alarm probabilities are, then the estimation results may be adjusted accordingly to utilize such
knowledge.

The channel state process $Z(t)$ is assumed to be a continuous-time alternating renewal process,
alternating between on/busy (state "1") and off/idle (state "0"); an illustration is given in
Figure 1. Typically, it is assumed that a secondary user can utilize the channel only when it is sensed
to be in the off state (i.e., when the channel is idle or the primary user is absent). When the
channel state transitions to the on state, the secondary user is required to vacate the channel so
as not to interfere with the primary user (also referred to as the spectrum underlay paradigm,
see e.g., [13]).

This random process is completely defined by two probability density functions $f_1(t)$ and
$f_0(t)$, $t > 0$, i.e., the probability distributions of the sojourn times of the on periods (denoted by
the random variable $T_1$) and the off periods (denoted by the random variable $T_0$), respectively.
The channel utilization $u$ is defined as

$$u = \frac{E[T_1]}{E[T_1] + E[T_0]},$$

which is also the average fraction of time the channel is occupied or busy. By the definition
of a renewal process, $T_1$ and $T_0$ are independent and all on (off) periods are independently and
identically distributed. It is worth pointing out that the widely used Gilbert-Elliott model (a
two-state Markov chain) is a special case of the alternating renewal process where the on (off) periods
are exponentially (in the case of continuous time) or geometrically (in the case of discrete time)
distributed.
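
As a rough illustration of this model, the sketch below simulates the exponential (Gilbert-Elliott) special case and measures the fraction of time the channel is busy, which should approach the utilization u. The function names, the seed, and the mean durations are illustrative assumptions, not values prescribed by the model.

```python
import random

def simulate_channel(mean_on, mean_off, horizon, rng):
    """Simulate an on/off channel on [0, horizon] with exponential sojourn times.

    Returns the initial state and the sorted list of state-switch times.
    """
    u = mean_on / (mean_on + mean_off)             # channel utilization
    initial_state = 1 if rng.random() < u else 0   # start in equilibrium
    state, t, switches = initial_state, 0.0, []
    while True:
        mean = mean_on if state == 1 else mean_off
        t += rng.expovariate(1.0 / mean)           # memoryless sojourn in current state
        if t >= horizon:
            return initial_state, switches
        switches.append(t)
        state = 1 - state

def busy_fraction(initial_state, switches, horizon):
    """Fraction of [0, horizon] during which the channel is in state 1."""
    busy, state, prev = 0.0, initial_state, 0.0
    for s in switches + [horizon]:
        if state == 1:
            busy += s - prev
        prev, state = s, 1 - state
    return busy / horizon

rng = random.Random(0)                             # fixed seed, for repeatability only
init, sw = simulate_channel(mean_on=1.0, mean_off=2.0, horizon=20000.0, rng=rng)
frac = busy_fraction(init, sw, 20000.0)            # expected to be near u = 1/3
```

Starting the simulation from the stationary state with a fresh exponential sojourn is exact only because of the memoryless property; for general on/off densities the first sojourn would need the equilibrium residual distribution instead.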

Fig. 1. Channel model: alternating renewal process with on and off states

III. Maximum Likelihood (ML) Based Channel Estimation 

We proceed to describe the maximum likelihood (ML) estimator [14] we will use to estimate
channel parameters from a sequence of channel state observations. 



Recall that the channel state is assumed to follow an alternating renewal process. Such a
process is completely characterized by the set of conditional probabilities $P_{ij}(\Delta t)$, $i,j \in \{0, 1\}$,
$\Delta t > 0$, defined as the probability that given $i$ was observed $\Delta t$ time units ago, $j$ is now
observed. This quantity is also commonly known as the semi-Markov kernel of an alternating
renewal process [15]. Assuming the process is in equilibrium, standard results from renewal
theory [15] suggest the following Laplace transforms of the above transition probabilities:

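$$P^*_{00}(s) = \frac{1}{s} - \frac{\{1 - f_1^*(s)\}\{1 - f_0^*(s)\}}{E[T_0]\, s^2 \{1 - f_1^*(s) f_0^*(s)\}},$$

$$P^*_{01}(s) = \frac{\{1 - f_1^*(s)\}\{1 - f_0^*(s)\}}{E[T_0]\, s^2 \{1 - f_1^*(s) f_0^*(s)\}},$$

$$P^*_{10}(s) = \frac{\{1 - f_1^*(s)\}\{1 - f_0^*(s)\}}{E[T_1]\, s^2 \{1 - f_1^*(s) f_0^*(s)\}},$$

$$P^*_{11}(s) = \frac{1}{s} - \frac{\{1 - f_1^*(s)\}\{1 - f_0^*(s)\}}{E[T_1]\, s^2 \{1 - f_1^*(s) f_0^*(s)\}},$$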

where $f_1^*(s)$ and $f_0^*(s)$ are the Laplace transforms of $f_1(t)$ and $f_0(t)$, respectively. We see that
these are completely defined by the probability density functions $f_1(t)$ and $f_0(t)$. The above
set of equations is very useful in recovering the time-domain expressions of the semi-Markov
kernel (oftentimes this is the only viable method). For example, in the special case where the
channel has exponentially distributed on/off periods, we have

$$f_1(t) = \theta_1 e^{-\theta_1 t}, \qquad f_0(t) = \theta_0 e^{-\theta_0 t}.$$

Their corresponding Laplace transforms and expectations are

$$f_1^*(s) = \theta_1/(s + \theta_1), \qquad E[T_1] = 1/\theta_1,$$

$$f_0^*(s) = \theta_0/(s + \theta_0), \qquad E[T_0] = 1/\theta_0.$$

Substituting the above expressions into the Laplace-domain transition probabilities, followed by
an inverse Laplace transform, we get the state transition probability as follows:

$$P_{ij}(\Delta t) = u^{j}(1 - u)^{1-j} + (-1)^{i+j} u^{1-i}(1 - u)^{i} e^{-(\theta_0 + \theta_1)\Delta t}, \qquad (4)$$

where $u = \frac{E[T_1]}{E[T_1] + E[T_0]} = \frac{\theta_0}{\theta_0 + \theta_1}$, as defined earlier.
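
To make the exponential-case transition probability concrete, the following minimal sketch evaluates Eqn (4) directly. The function name and the numerical rates are assumptions chosen for illustration.

```python
import math

def transition_prob(i, j, dt, theta0, theta1):
    """P_ij(dt) of Eqn (4): probability of observing j, given i was observed dt ago."""
    u = theta0 / (theta0 + theta1)                       # channel utilization
    stationary = u**j * (1 - u)**(1 - j)                 # limit as dt grows
    transient = ((-1)**(i + j) * u**(1 - i) * (1 - u)**i
                 * math.exp(-(theta0 + theta1) * dt))    # decays at rate theta0 + theta1
    return stationary + transient

theta0, theta1 = 0.5, 1.0            # illustrative rates: E[T0] = 2, E[T1] = 1
p01_short = transition_prob(0, 1, 0.1, theta0, theta1)
p01_long = transition_prob(0, 1, 10.0, theta0, theta1)   # essentially equal to u
```

For a long interval the result is practically the stationary probability, so the observation carries almost no information about the individual rates.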

In this paper we consider the following estimation problem. Assume that the on/off periods
are given by certain known distribution functions $f_0(t)$ and $f_1(t)$ but with unknown parameters.
Suppose we obtain $m$ samples $\{z_1, z_2, \cdots, z_m\}$, taken at sampling times $\{t_1, t_2, \cdots, t_m\}$,
respectively. We wish to use these samples to estimate the unknown parameters.

First note that the channel utilization factor $u$ can be estimated through the sample mean of
the $m$ measurements as follows:

$$\hat{u} = \frac{1}{m} \sum_{i=1}^{m} z_i. \qquad (5)$$

Let $\theta$ be the unknown parameters of the on/off distributions: $\theta = \{\theta_1, \theta_0\}$. Note that in general
$\theta_1$ and $\theta_0$ are vectors themselves. Then the likelihood function is given by

$$L(\theta) = \Pr\{Z; \theta\} = \Pr\{Z_{t_m} = z_m, Z_{t_{m-1}} = z_{m-1}, Z_{t_{m-2}} = z_{m-2}, \cdots, Z_{t_1} = z_1; \theta\}. \qquad (6)$$

The idea of ML estimation is to find the value of $\theta$ that maximizes the log likelihood function
$\ln L(\theta)$, i.e., the estimate $\hat{\theta}$ is such that $\left.\frac{\partial \ln L(\theta)}{\partial \theta}\right|_{\theta = \hat{\theta}} = 0$. This method has been used extensively
in the literature [16] - [20]. For a fixed set of data and underlying probability model, the ML
estimator selects the parameter value that makes the data "most likely" among all possible 
choices. Under certain (fairly weak) regularity conditions the ML estimator is asymptotically 
optimal Bm. 

The question we wish to investigate is what impact the selection of the sampling time sequence 
$\{t_1, t_2, \cdots, t_m\}$ has on the performance of this estimator, given a limited number of samples $m$.
Specifically, we question whether random sampling is a better way of sensing the channel than 
uniform sampling where the measurement samples are taken at regular time intervals. 

For the remainder of our analysis we will limit our attention to the case where the channel 
on/off durations are given by exponential distributions. This is for both mathematical tractability 
and simplicity of presentation. We explore other distributions in our numerical experiments. 

Since the exponential distribution is defined by a single parameter, we have now $\theta = \{\theta_1, \theta_0\}$,
where $\theta_1$ and $\theta_0$ are the two unknown scalar parameters of the on and off exponential distributions,
respectively. Using the memoryless property, the likelihood function becomes

$$L(\theta) = \Pr\{Z; \theta\} = \Pr\{Z_{t_1} = z_1; \theta\} \cdot \prod_{i=2}^{m} \Pr\{Z_{t_i} = z_i \mid Z_{t_{i-1}} = z_{i-1}; \theta\} = \Pr\{Z_{t_1} = z_1; \theta\} \cdot \prod_{i=2}^{m} P_{z_{i-1} z_i}(\Delta t_i; \theta), \qquad (7)$$

where $\Delta t_i = t_i - t_{i-1}$. The first quantity on the right is taken to be

$$\Pr\{Z_{t_1} = z_1; \theta\} = u^{z_1}(1 - u)^{1 - z_1}. \qquad (8)$$

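That is, the probability of finding the channel in a particular state (LHS of Eqn (8)) is taken to
be the stationary distribution given by the RHS. This choice is justified by assuming that the
channel is in equilibrium.

The second quantity $P_{z_{i-1} z_i}(\Delta t_i; \theta)$ is given in Eqn (4). Combining these two quantities, we
have

$$L(\theta_0, \theta_1) = L(\theta) = u^{z_1}(1 - u)^{1 - z_1} \prod_{i=2}^{m} \left[ u^{z_i}(1 - u)^{1 - z_i} + (-1)^{z_{i-1} + z_i} u^{1 - z_{i-1}}(1 - u)^{z_{i-1}} e^{-(\theta_0 + \theta_1)\Delta t_i} \right]. \qquad (9)$$

The estimates for the parameters are found by solving

$$\frac{\partial \ln L(\theta)}{\partial \theta_i} = 0, \quad i = 0, 1. \qquad (10)$$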

Technically, to get the estimates for both $\theta_0$ and $\theta_1$ one needs to solve the above two equations
simultaneously. This however proves to be computationally complex and analytically intractable.
Instead, we adopt the following estimation procedure. We first estimate $u$ using Eqn (5), and
take $\theta_1 = \frac{(1 - u)\theta_0}{u}$, which follows from $u = \theta_0/(\theta_0 + \theta_1)$. Due to the exponential assumption,
it can be shown that this estimate of $u$ is unbiased regardless of the sequence $\{t_1, \cdots, t_m\}$ as long
as it is determined offline. The likelihood function can then be re-written as

$$L(\theta_0) = u^{z_1}(1 - u)^{1 - z_1} \prod_{i=2}^{m} \left[ u^{z_i}(1 - u)^{1 - z_i} + (-1)^{z_{i-1} + z_i} u^{1 - z_{i-1}}(1 - u)^{z_{i-1}} e^{-\theta_0 \Delta t_i / u} \right]. \qquad (11)$$

The estimate of $\theta_0$ is then derived by solving the equation $\frac{\partial \ln L(\theta_0)}{\partial \theta_0} = 0$.

In our analysis, we will use this procedure by treating $u$ as a known constant and solely focus
on the estimation of $\theta_0$, with the understanding that $u$ is separately and unbiasedly estimated,
and once we have the estimate for $\theta_0$ we have the estimate for $\theta_1$. It has to be noted that this
procedure is in general not equivalent to solving (10) simultaneously. However, we have found
this to be a very good approximation, computationally feasible, and much more amenable to
analysis.
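
The two-step procedure can be sketched as follows. This is only an illustration under stated assumptions: the synthetic data generator, the seed, the candidate grid, and the use of a plain grid search (instead of solving the derivative equation) are all choices made here for simplicity.

```python
import math
import random

def log_likelihood(theta0, u, z, dts):
    """ln L(theta0) of Eqn (11), omitting the first factor (it does not depend on theta0)."""
    ll = 0.0
    for k in range(1, len(z)):
        prev, cur = z[k - 1], z[k]
        a = u**cur * (1 - u)**(1 - cur)
        b = (-1)**(prev + cur) * u**(1 - prev) * (1 - u)**prev
        ll += math.log(a + b * math.exp(-theta0 * dts[k - 1] / u))
    return ll

def estimate(z, dts, grid):
    """Estimate u by the sample mean (Eqn (5)), then theta0 by a 1-D grid search."""
    u_hat = sum(z) / len(z)
    theta0_hat = max(grid, key=lambda th: log_likelihood(th, u_hat, z, dts))
    theta1_hat = (1 - u_hat) * theta0_hat / u_hat
    return u_hat, theta0_hat, theta1_hat

# Synthetic observations of a channel with theta0 = 0.5, theta1 = 1.0 (assumed values),
# sampled at random intervals and generated from the transition probabilities of Eqn (4).
rng = random.Random(1)
theta0, theta1, m = 0.5, 1.0, 3000
u = theta0 / (theta0 + theta1)
dts = [rng.expovariate(1.0) for _ in range(m - 1)]
z = [1 if rng.random() < u else 0]
for dt in dts:
    decay = math.exp(-(theta0 + theta1) * dt)
    p_busy = u + (1 - u) * decay if z[-1] == 1 else u - u * decay   # P_11 or P_01
    z.append(1 if rng.random() < p_busy else 0)

grid = [0.05 * k for k in range(1, 41)]          # candidate theta0 values 0.05 .. 2.0
u_hat, theta0_hat, theta1_hat = estimate(z, dts, grid)
```

Because u is fixed before the search, the maximization is over a single scalar, which is what makes this procedure cheap compared with solving the two equations in (10) jointly.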



IV. Best and worst sampling sequences 

The goal of this study is to judiciously schedule a very limited number of sampling times so 
that the estimation accuracy is least affected. We first argue intuitively why the commonly used 
uniform sampling does not perform well when the number of samples allowed is limited. This 
motivates us to look for better sampling schemes. We then present a precise analysis through 
the use of Fisher information, in the case of exponential on/off distributions. In particular, we 
will show that using this measure, under a certain sparsity condition, uniform sensing is the 
worst schedule in terms of its estimation accuracy. We also derive an upper bound on the Fisher 
information as well as the sampling sequence achieving this upper bound. These provide us 
with useful benchmarks to assess any arbitrary sampling sequence. We then present a dynamic 
programming approach to finding the best and worst sampling sequence without the sparsity 
condition, which provides a further bound on how well any sampling sequence can be expected 
to perform. 

A. An intuitive explanation 

Uniform sensing, where samples are taken at constant time intervals, is a natural, easy-to- 
implement, and easy-to-analyze scheme. Specifically, with the on/off durations being exponential 
the likelihood function has a particularly simple form; there is also a closed-form solution to 
the maximization of the log likelihood function, see e.g., flU. However, when sensing is done 
sparsely, certain problems arise. One of the first things to note is that since there is no variation 
across sampling intervals under uniform sensing, the uniform interval in general needs to be 
upper-bounded in order to catch potential channel state changes that occur over small intervals^1.
This bound cannot be guaranteed under sparse sensing. If sensing is done randomly, then even if
the average sampling interval is large, there can be significant probability for sufficiently small
sampling intervals to exist in any realization of the sampling time sequence $\{t_1, t_2, \cdots, t_m\}$.

We show in Figure 2 a comparison between uniform sensing and random sensing where the
sensing times are randomly placed using a uniform distribution^2 within a window of 5000 time
units. The on/off periods are exponentially distributed with parameters $E[T_0] = 2$, $E[T_1] = 1$
time units, respectively. The figure shows the estimated value of $E[T_0]$ as a function of the
number of samples taken within the window of 5000. We see that random sensing outperforms
uniform sensing, and significantly so when $m$ is small.

^1 One such upper bound was proposed in [5].
^2 Here uniform distribution refers to the sampling times being randomly placed within the window following a uniform
distribution, not to be confused with uniform sensing where sampling intervals are a constant.



Fig. 2. Estimation accuracy: uniform sensing vs. random sensing ($E[T_0] = 2$, $E[T_1] = 1$; horizontal axis: sample number, $T = 5000$)



The key to the increased accuracy is not so much that we used randomly generated sensing
times as the fact that a randomly generated sequence contains significantly more variability in
its sampling intervals. In this sense a sequence does not have to be randomly generated; as long 
as it contains sufficient variability, estimation accuracy can be improved. Random generation is 
an easy and more systematic way of obtaining such a sequence. 
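
The difference in interval variability can be seen in a small sketch such as the one below, which builds a uniform sensing sequence and a randomly placed one over the same window and compares the spread of their sampling intervals. The helper names and the seed are illustrative assumptions.

```python
import random

def uniform_sensing(T, m):
    """m sampling times evenly spaced over [0, T]."""
    return [k * T / (m - 1) for k in range(m)]

def random_sensing(T, m, rng):
    """m sampling times placed independently and uniformly at random in [0, T]."""
    return sorted(rng.uniform(0.0, T) for _ in range(m))

def intervals(times):
    """Spacings between consecutive sampling times."""
    return [b - a for a, b in zip(times, times[1:])]

rng = random.Random(2)
T, m = 5000.0, 200
du = intervals(uniform_sensing(T, m))
dr = intervals(random_sensing(T, m, rng))

# The average spacing is of the same order, but the spread differs greatly.
spread_uniform = max(du) - min(du)    # zero up to rounding
spread_random = max(dr) - min(dr)     # includes both very short and very long gaps
```

The short gaps in the random sequence are the ones that fall well inside the mixing time of the channel, which is where the observations remain informative.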

To see why this variability is important when sampling is sparse, consider the transition
probabilities $P_{ij}(\Delta t)$, $i,j \in \{0, 1\}$. As shown in the previous section, these probabilities completely
define the likelihood function. They approach the stationary probabilities as $\Delta t$ increases. For
instance, we have $P_{01}(\Delta t) \to \frac{E[T_1]}{E[T_1] + E[T_0]} = u$ as $\Delta t \to \infty$, and so on. This stationary quantity
represents the average fraction of time the channel is busy, which contains little direct information
on the average length of a busy period, the parameter we are trying to estimate. Depending on
the mixing time of the underlying renewal process, this convergence can occur rather quickly.
What this means is that if sampling is sparsely done, then these transition probabilities will
become constant-like (i.e., approaching the stationary value). Loosely speaking, this means that
the samples are of a similar quality, each providing little additional information. This also in
turn causes the likelihood function to be constant-like, making it difficult for the ML estimator
to produce accurate estimates [14]. Interestingly, in a similar spirit but for a different problem,
1171 studied an information retrieval problem where sensors are queried for data and they may 
be active or inactive. It was shown that if the active sensors are sparse, then randomly accessing 
them outperforms periodic (or uniform) schedules. 

B. Fisher information and preliminaries 

We now analyze this notion of information content more formally via a measure known as the
Fisher information [22]. For the likelihood function given in Eqn (11), the Fisher information
is defined as:

$$I(\theta_0) = -E\left[\frac{\partial^2 \ln L(\theta_0)}{\partial \theta_0^2}\right]. \qquad (12)$$

The Fisher information is a measure of the amount of information an observable random variable 
conveys about an unknown parameter. This measure of information is particularly useful when 
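comparing two observation methods of random processes (see e.g., [23]). The precision to which
we can estimate $\theta_0$ is fundamentally limited by the Fisher information of the likelihood function.
Due to the product form of the likelihood function, we have

$$I(\theta_0) = -E\left[\frac{\partial^2 \ln L(\theta_0)}{\partial \theta_0^2}\right] = -\sum_{i=2}^{m} E\left[\frac{\Delta t_i^2}{u^2} \cdot \frac{\alpha_i \beta_i e^{-\theta_0 \Delta t_i / u}}{\left(\alpha_i + \beta_i e^{-\theta_0 \Delta t_i / u}\right)^2}\right], \qquad (13)$$

where $\alpha_i = u^{z_i}(1 - u)^{1 - z_i}$ and $\beta_i = (-1)^{z_{i-1} + z_i} u^{1 - z_{i-1}}(1 - u)^{z_{i-1}}$. Define:

$$g(\Delta t_i) = -E\left[\frac{\Delta t_i^2}{u^2} \cdot \frac{\alpha_i \beta_i e^{-\theta_0 \Delta t_i / u}}{\left(\alpha_i + \beta_i e^{-\theta_0 \Delta t_i / u}\right)^2}\right], \qquad (14)$$

so that the Fisher information can be simply written as $I(\theta_0) = \sum_{i=2}^{m} g(\Delta t_i)$. The function $g(\cdot)$
will be referred to as the Fisher function in our discussion. Note that $g(\cdot)$ is a function of both
$\Delta t_i$ and $\theta_0$. However, we will suppress $\theta_0$ from the argument and write it simply as $g(\Delta t)$.
This is because our analysis focuses on how this function behaves as we select different $\Delta t$
(the sampling interval) while holding $\theta_0$ constant. Note that the first term in Eqn (11) does not
appear in the above expression. This is because this first term is only a function of $u$ (see Eqn
(8)), which is separately estimated using Eqn (5) and not viewed as a function of $\theta_0$. Therefore
the term disappears after the differentiation.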
The expectation on the right-hand side of (13) can be calculated by considering all four
possibilities for the pair $(Z_{i-1}, Z_i)$, i.e., (0, 0), (0, 1), (1, 0), and (1, 1). Using Eqn (4), we obtain
the probability of each case to be $(1 - u)P_{00}(\Delta t)$, $(1 - u)P_{01}(\Delta t)$, $uP_{10}(\Delta t)$ and
$uP_{11}(\Delta t)$, respectively. We can therefore calculate the Fisher function as follows:

$$g(\Delta t) = e^{-\theta_0 \Delta t / u}\, \frac{\Delta t^2}{u^2} \left[ \frac{u^2 (1 - u)}{u - u e^{-\theta_0 \Delta t / u}} + \frac{u (1 - u)^2}{(1 - u) - (1 - u) e^{-\theta_0 \Delta t / u}} - \frac{u (1 - u)^2}{(1 - u) + u e^{-\theta_0 \Delta t / u}} - \frac{u^2 (1 - u)}{u + (1 - u) e^{-\theta_0 \Delta t / u}} \right]. \qquad (15)$$
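
As a numerical companion to the four-case calculation, the sketch below evaluates the Fisher function by summing, over the four (previous, current) state pairs, the pair probability times the negated second derivative of the log transition probability with respect to θ0. The function name `fisher_g` and the parameter values are assumptions of this illustration.

```python
import math

def fisher_g(dt, theta0, u):
    """Fisher function g(dt): minus the expected second derivative of ln P(dt; theta0)."""
    x = math.exp(-theta0 * dt / u)
    total = 0.0
    for prev in (0, 1):
        p_prev = u if prev == 1 else 1 - u        # stationary probability of z_{i-1}
        for cur in (0, 1):
            a = u**cur * (1 - u)**(1 - cur)                            # alpha_i
            b = (-1)**(prev + cur) * u**(1 - prev) * (1 - u)**prev     # beta_i
            # Pair probability is p_prev * (a + b x); the second derivative of
            # ln(a + b x) in theta0 is (dt/u)^2 * a * b * x / (a + b x)^2.
            total -= p_prev * (dt / u) ** 2 * a * b * x / (a + b * x)
    return total

theta0, u = 0.2, 0.375            # illustrative: E[T0] = 5, E[T1] = 3
g_short = fisher_g(2.0, theta0, u)
g_long = fisher_g(20.0, theta0, u)   # long intervals carry very little information
```

For u = 1/2 the sum collapses to (Δt/u)² x² / (1 − x²) with x = exp(−θ0 Δt/u), which gives a convenient sanity check on the implementation.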

Below we show that under a certain sparsity condition on the sampling rate, the Fisher function 
is strictly convex, and that the Fisher information is minimized when uniform sampling is used. 
We begin by introducing this sparsity condition. 

Condition 1: (Sparsity condition) Let a = max{2 + \/2,\n{^^^),\n{j^)}. This condition 
requires that $\Delta t > \alpha u/\theta_0$.

Taking $\Delta t$ to be the time between two consecutive sampling points, the above condition states
that these two points cannot be too close together with respect to the average off duration ($1/\theta_0$)
and the channel utilization $u$.

Lemma 1: The Fisher function $g(\Delta t)$ given in Eqn (15) is strictly convex under Condition 1
(i.e., for $\Delta t > \alpha u/\theta_0$).

The proof of this lemma can be found in the Appendix. Using this lemma we next derive 
tight lower and upper bounds of the Fisher information. 

C. A tight lower bound on the Fisher information 

Lemma 2: For any $n \in \mathbb{N}$, $n \geq 1$, $T \in \mathbb{R}$, $T > (n+1)\alpha u/\theta_0$, and $\alpha u/\theta_0 < \Delta t < T - n\alpha u/\theta_0$,
the function $G(\Delta t) = n\, g\left(\frac{T - \Delta t}{n}\right) + g(\Delta t)$ has a minimum of $(n+1)\, g\left(\frac{T}{n+1}\right)$ attained at $\Delta t = \frac{T}{n+1}$.
Proof: Setting the first derivative of $G$ to zero and solving for $\Delta t$ results in solving the
equation $g'(\Delta t) = g'\left(\frac{T - \Delta t}{n}\right)$. Since the arguments on both sides satisfy Condition 1 by the
assumption of the lemma, $g$ is strictly convex according to Lemma 1 and $g'$ is a strictly monotonic
function. Therefore there exists a unique solution within the range $(\alpha u/\theta_0, T - n\alpha u/\theta_0)$ to
this equation, at $\Delta t = \frac{T}{n+1}$.

Next we calculate the second derivative of $G$ at this point. Since $G''(\Delta t) = g''(\Delta t) +
\frac{1}{n} g''\left(\frac{T - \Delta t}{n}\right)$, we have $G''\left(\frac{T}{n+1}\right) = \left(1 + \frac{1}{n}\right) g''\left(\frac{T}{n+1}\right)$. Since $T > (n+1)\alpha u/\theta_0$, $g$ is convex at this
stationary point by Lemma 1. Hence $G$ is convex at this point and it is thus a global minimum
within the range $(\alpha u/\theta_0, T - n\alpha u/\theta_0)$; the minimum value is $(n+1)\, g\left(\frac{T}{n+1}\right)$, completing the
proof. ■

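Theorem 1: Consider a period of time $[0, T]$, in which we wish to schedule $m \geq 3$ sampling
points, including one at time 0 and one at time $T$. Denote the sequence of time spacings between
these samples as $\boldsymbol{\Delta t} = [\Delta t_2, \Delta t_3, \cdots, \Delta t_m]$, where $\sum_{i=2}^{m} \Delta t_i = T$. For a given sequence $\boldsymbol{\Delta t}$,
define the Fisher information $I(\theta_0)$ as in Eqn (13) and rewrite it as $I(\theta_0; \boldsymbol{\Delta t})$ to emphasize its
dependence on $\boldsymbol{\Delta t}$. Assuming $T > (m-1)\alpha u/\theta_0$, then we have

$$\min_{\boldsymbol{\Delta t} \in A_m} I(\theta_0; \boldsymbol{\Delta t}) = (m-1)\, g\left(\frac{T}{m-1}\right),$$

where $A_m = \{\Delta t_i : \sum_{i=2}^{m} \Delta t_i = T,\ \Delta t_i > \alpha u/\theta_0,\ i = 2, \cdots, m\}$, and with the minimum
achieved at $\Delta t_i = \frac{T}{m-1}$, $i = 2, \cdots, m$.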

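Proof: We prove this by induction on $m$.
Induction basis: For $m = 3$,

$$I(\theta_0; \boldsymbol{\Delta t}) = g(\Delta t_2) + g(\Delta t_3).$$

Using Lemma 2 in the special case of $n = 1$ the result follows.

Induction step: Suppose the result holds for $3, 4, \ldots, m$; we want to show it also holds for
$m + 1$ for $T > m\alpha u/\theta_0$. Note that in this case $\boldsymbol{\Delta t} \in A_{m+1}$ implies that $\alpha u/\theta_0 < \Delta t_{m+1} <
T - (m-1)\alpha u/\theta_0$, which will be denoted as $\Delta t_{m+1} \in A_{m+1}$ below for convenience. We thus
have

$$\begin{aligned}
\min_{\boldsymbol{\Delta t} \in A_{m+1}} I(\theta_0; \boldsymbol{\Delta t})
&= \min_{\boldsymbol{\Delta t} \in A_{m+1}} \left\{ \sum_{i=2}^{m} g(\Delta t_i) + g(\Delta t_{m+1}) \right\} \\
&= \min_{\Delta t_{m+1} \in A_{m+1}} \left\{ \min_{\sum_{i=2}^{m} \Delta t_i = T - \Delta t_{m+1}} \left\{ \sum_{i=2}^{m} g(\Delta t_i) \right\} + g(\Delta t_{m+1}) \right\} \\
&= \min_{\Delta t_{m+1} \in A_{m+1}} \left\{ (m-1)\, g\left(\frac{T - \Delta t_{m+1}}{m-1}\right) + g(\Delta t_{m+1}) \right\} \\
&= m\, g\left(\frac{T}{m}\right),
\end{aligned}$$

where the third equality is due to the induction hypothesis and the first term on the RHS is
obtained at $\Delta t_i = \frac{T - \Delta t_{m+1}}{m-1}$, $i = 2, \ldots, m$. The last equality invokes Lemma 2 in the special case
of $n = m - 1$, and is obtained at $\Delta t_{m+1} = \frac{T}{m}$. Combining these we conclude that the minimum
value of the Fisher information is $m\, g\left(\frac{T}{m}\right)$, attained when $\Delta t_i = \frac{T}{m}$, $i = 2, \ldots, m + 1$. Thus the case $m + 1$
also holds, completing the proof. ■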

Theorem 1 states that given the total sensing period T and the total number of samples m, 
provided that the sampling is done sparsely (with sufficiently large sampling intervals as defined 
in Condition 1), the Fisher information attains its minimum when all sampling intervals have
the same value, i.e., when using a uniform sensing schedule. In this sense uniform sensing is the
worst possible sensing scheme; any deviation from it, while keeping the same average sampling
interval $T/(m-1)$, can only increase the Fisher information. As we have seen in Figure 2, this
increase in Fisher information becomes more significant when sampling gets sparser, i.e., when 
m decreases. 
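
A quick numerical check of this statement is sketched below. It compares the total Fisher information of the uniform sequence with that of a few perturbed sequences having the same total length. The parameter values, the perturbed sequences, and the simplification α = 2 + √2 (the logarithmic terms of Condition 1 are ignored here) are assumptions of this illustration.

```python
import math

def fisher_g(dt, theta0, u):
    """Fisher function g(dt) from the four-case expectation over (z_{i-1}, z_i)."""
    x = math.exp(-theta0 * dt / u)
    total = 0.0
    for prev in (0, 1):
        p_prev = u if prev == 1 else 1 - u
        for cur in (0, 1):
            a = u**cur * (1 - u)**(1 - cur)
            b = (-1)**(prev + cur) * u**(1 - prev) * (1 - u)**prev
            total -= p_prev * (dt / u) ** 2 * a * b * x / (a + b * x)
    return total

def total_information(dts, theta0, u):
    """I(theta0; dts) as the sum of g over the sampling intervals."""
    return sum(fisher_g(d, theta0, u) for d in dts)

theta0, theta1 = 1 / 5, 1 / 3            # illustrative: E[T0] = 5, E[T1] = 3
u = theta0 / (theta0 + theta1)
T, m = 40.0, 5
alpha = 2 + math.sqrt(2)                 # assumed value of alpha (log terms ignored)
min_gap = alpha * u / theta0             # smallest interval allowed by the condition

uniform = [T / (m - 1)] * (m - 1)
perturbed = [[8.0, 12.0, 10.0, 10.0], [7.0, 13.0, 9.0, 11.0], [6.5, 6.5, 13.5, 13.5]]

info_uniform = total_information(uniform, theta0, u)
info_perturbed = [total_information(p, theta0, u) for p in perturbed]
```

Every perturbed sequence respects the minimum gap and sums to T, so the comparison stays within the set over which the theorem takes its minimum.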

D. A tight upper bound on the Fisher information 

The derivation of the upper bound follows very similar steps as those for the lower bound. 

Lemma 3: For any $T \in \mathbb{R}$, $T > 2\alpha u/\theta_0$, and $\alpha u/\theta_0 \leq \Delta t \leq T - \alpha u/\theta_0$, the function
$F(\Delta t) = g(T - \Delta t) + g(\Delta t)$ has a maximum of $g(\alpha u/\theta_0) + g(T - \alpha u/\theta_0)$ attained at $\Delta t = \alpha u/\theta_0$
or $\Delta t = T - \alpha u/\theta_0$.

Proof: Firstly we prove that $F$ is convex under the stated conditions. We have

$$F'(\Delta t) = g'(\Delta t) - g'(T - \Delta t).$$

Since $g$ is strictly convex under the stated conditions by Lemma 1, $g'$ is monotonically increasing.
Thus $F'$ is also monotonically increasing, hence $F$ is convex. It follows that the maximum of $F(\Delta t)$
is attained at one and/or the other extreme point of $\Delta t$. In either case we have

$$F(\alpha u/\theta_0) = F(T - \alpha u/\theta_0) = g(\alpha u/\theta_0) + g(T - \alpha u/\theta_0).$$

■

Theorem 2: Consider a period of time $[0, T]$, in which we wish to schedule $m \geq 3$ sampling
points, including one at time 0 and one at time $T$. Denote the sequence of time spacings between
these samples as $\boldsymbol{\Delta t} = [\Delta t_2, \Delta t_3, \cdots, \Delta t_m]$, where $\sum_{i=2}^{m} \Delta t_i = T$. Assuming $T > (m-1)\alpha u/\theta_0$,
then we have

$$\max_{\boldsymbol{\Delta t} \in A_m} I(\theta_0; \boldsymbol{\Delta t}) = (m-2)\, g(\alpha u/\theta_0) + g(T - (m-2)\alpha u/\theta_0),$$

where $A_m = \{\Delta t_i : \sum_{i=2}^{m} \Delta t_i = T,\ \Delta t_i \geq \alpha u/\theta_0,\ i = 2, \cdots, m\}$, and with the maximum
achieved at $\Delta t_i = \alpha u/\theta_0$, $i = 2, \cdots, m - 1$ and $\Delta t_m = T - (m-2)\alpha u/\theta_0$.
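Proof: We prove this by induction on $m$.

Induction basis: For $m = 3$, $I(\theta_0; \boldsymbol{\Delta t}) = g(\Delta t_2) + g(\Delta t_3)$. Using Lemma 3 the result
immediately follows.

Induction step: Suppose the result holds for $3, 4, \ldots, m$; we want to show it also holds for
$m + 1$ for $T > m\alpha u/\theta_0$. Again in this case $\boldsymbol{\Delta t} \in A_{m+1}$ implies that $\alpha u/\theta_0 \leq \Delta t_{m+1} \leq
T - (m-1)\alpha u/\theta_0$, which will be denoted as $\Delta t_{m+1} \in A_{m+1}$ for convenience. We thus have

$$\begin{aligned}
\max_{\boldsymbol{\Delta t} \in A_{m+1}} I(\theta_0; \boldsymbol{\Delta t})
&= \max_{\boldsymbol{\Delta t} \in A_{m+1}} \left\{ \sum_{i=2}^{m} g(\Delta t_i) + g(\Delta t_{m+1}) \right\} \\
&= \max_{\Delta t_{m+1} \in A_{m+1}} \left\{ \max_{\sum_{i=2}^{m} \Delta t_i = T - \Delta t_{m+1}} \left\{ \sum_{i=2}^{m} g(\Delta t_i) \right\} + g(\Delta t_{m+1}) \right\} \\
&= \max_{\Delta t_{m+1} \in A_{m+1}} \left\{ (m-2)\, g(\alpha u/\theta_0) + g(T - \Delta t_{m+1} - (m-2)\alpha u/\theta_0) + g(\Delta t_{m+1}) \right\} \\
&= (m-1)\, g(\alpha u/\theta_0) + g(T - (m-1)\alpha u/\theta_0),
\end{aligned}$$

where the third equality is due to the induction hypothesis and the first term on the RHS is
obtained at $\Delta t_i = \alpha u/\theta_0$, $i = 2, \ldots, m - 1$ and $\Delta t_m = T - \Delta t_{m+1} - (m-2)\alpha u/\theta_0$. The last
equality invokes Lemma 3 and is obtained at $\Delta t_{m+1} = T - (m-1)\alpha u/\theta_0$ or $\Delta t_{m+1} = \alpha u/\theta_0$.
Thus the case $m + 1$ also holds, completing the proof. ■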

We see from this theorem that under the sparsity condition, the best sensing sequence is to 
sample at the smallest interval that the condition would allow, till we use all the m — 2 samples 
we have the freedom of placing. This produces a uniform sequence of sampling times except 
for the last one. It can be shown that if we remove the constraint of having a window of T, 
but rather seek to optimally place m points subject to the sparsity condition, then the optimal 
sequence would be exactly uniform with the interval $\Delta t_i = \alpha u/\theta_0$. However, since $\theta_0$ is the very
thing we are trying to estimate, it would be unreasonable to suggest that this optimal interval is
known a priori. Therefore, this optimal sequence, while it exists, is not in general implementable.
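
The two extremal sequences under the sparsity condition can be constructed and compared with a short sketch like the following. The helper names, the channel parameters, and the simplification α = 2 + √2 (ignoring the logarithmic terms of Condition 1) are assumptions made for illustration; the true θ0 is used to build the best sequence, which is exactly why it is not implementable in practice.

```python
import math

def fisher_g(dt, theta0, u):
    """Fisher function g(dt) from the four-case expectation over (z_{i-1}, z_i)."""
    x = math.exp(-theta0 * dt / u)
    total = 0.0
    for prev in (0, 1):
        p_prev = u if prev == 1 else 1 - u
        for cur in (0, 1):
            a = u**cur * (1 - u)**(1 - cur)
            b = (-1)**(prev + cur) * u**(1 - prev) * (1 - u)**prev
            total -= p_prev * (dt / u) ** 2 * a * b * x / (a + b * x)
    return total

def best_sparse_intervals(T, m, min_gap):
    """Maximizer of Theorem 2: m - 2 intervals at the minimum gap, the remainder last."""
    return [min_gap] * (m - 2) + [T - (m - 2) * min_gap]

def worst_sparse_intervals(T, m):
    """Minimizer of Theorem 1: the uniform sequence."""
    return [T / (m - 1)] * (m - 1)

theta0, theta1 = 1 / 5, 1 / 3            # illustrative: E[T0] = 5, E[T1] = 3
u = theta0 / (theta0 + theta1)
T, m = 40.0, 5
min_gap = (2 + math.sqrt(2)) * u / theta0    # assumed alpha = 2 + sqrt(2)

def info(dts):
    return sum(fisher_g(d, theta0, u) for d in dts)

best = best_sparse_intervals(T, m, min_gap)
worst = worst_sparse_intervals(T, m)
other = [8.0, 12.0, 10.0, 10.0]          # an arbitrary feasible sequence in between
info_best, info_other, info_worst = info(best), info(other), info(worst)
```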
E. Best and worst sampling schemes without the sparsity condition

The preceding upper- and lower-bound achieving sensing sequences were derived under the 
sparsity Condition 1. Below we show how to obtain the best and worst sensing sequences in a 
more general setting, without the requirement of Condition 1, via the use of dynamic program- 
ming. While this result is more general compared to those derived under the sparsity condition, 
structurally they are not as easy to identify and are thus given in a numerical form. These 
sequences are also not practically implementable as they also assume the a priori knowledge of 
the parameters to be estimated. 

Denote by $\pi$ a sampling policy given by the time sequence $\{t_1, t_2, \ldots, t_m\}$. Then the optimal sampling policy is given by

$$\pi^* = \arg\max_{\pi \in \Pi} I(\theta_0), \qquad (16)$$

where the set of admissible policies is $\Pi = \{\pi : t_1 = 0,\ t_m = T,\ 0 < t_2 < \cdots < t_{m-1} < T\}$.

The maximum $I(\theta_0)$ can be recursively solved through the set of dynamic programming equations given below:

$$V(1,t) = g(T-t), \quad \forall\, 0 < t < T;$$

$$V(k,t) = \max_{t<x<T} \left[ g(x-t) + V(k-1,x) \right], \quad \forall\, 0 < t < T,\ k = 2, 3, \ldots, m-1, \qquad (17)$$

and

$$\max I(\theta_0) = \max_{0<t<T} \left[ g(t) + V(m-1,t) \right]. \qquad (18)$$

Here the value function V{k,t) denotes the maximum achievable Fisher information given we 
last sampled at time t, with k points remaining to be placed between (t,T]. 

Note that since $t$ is continuous, the pair $(k,t)$ has an uncountable state space. In computing the DP equation (17) we discretize $t$ and $T$ into small steps and require that both be integer multiples of this small quantity. The resulting DP has a finite state space and can be solved backwards in time in a standard manner.
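A minimal sketch of the discretized recursion follows. It is not the authors' implementation, and several things are assumptions: the closed form of $g$ (the product $h_0 h$ from Appendix A), the grid step, and the indexing, in which `V[k][i]` is the optimal information with `k` intervals still to be placed when the last sample sits at grid index `i` and the final sample is fixed at $T$. Passing `best=False` turns the maximization into a minimization.

```python
import math


def fisher_g(dt, theta0, u):
    """Fisher function g(dt) for one sampling interval (assumed closed form h0 * h)."""
    y = math.exp(-theta0 * dt / u)
    h0 = dt * dt * y / (u * u)
    h = (2 * u * (1 - u) * y / (1 - y)
         + u * u * (1 - u) * y / ((1 - u) + u * y)
         + u * (1 - u) ** 2 * y / (u + (1 - u) * y))
    return h0 * h


def dp_sequence(T, m, theta0, u, step=1.0, best=True):
    """Best (or worst) m sampling times on a grid, with t_1 = 0 and t_m = T."""
    n = int(round(T / step))                      # grid indices 0..n
    if m < 2 or n < m - 1:
        raise ValueError("need m >= 2 and at least m - 1 grid steps in the window")
    pick = max if best else min
    g = [0.0] + [fisher_g(j * step, theta0, u) for j in range(1, n + 1)]
    V = [dict() for _ in range(m)]                # V[k][i]: value with k intervals left
    nxt = [dict() for _ in range(m)]              # nxt[k][i]: next sample index chosen
    for i in range(n):
        V[1][i] = g[n - i]                        # one interval left: jump to T
    for k in range(2, m):
        for i in range(n - k + 1):                # leave room for k intervals
            V[k][i], nxt[k][i] = pick(
                (g[x - i] + V[k - 1][x], x) for x in range(i + 1, n - k + 2))
    times, i = [0.0], 0
    for k in range(m - 1, 1, -1):                 # backtrack the m - 2 free samples
        i = nxt[k][i]
        times.append(i * step)
    times.append(n * step)
    return times, V[m - 1][0]


best_times, best_info = dp_sequence(40, 5, 1 / 5, 3 / 8, best=True)
worst_times, worst_info = dp_sequence(40, 5, 1 / 5, 3 / 8, best=False)
print(best_times, best_info)
print(worst_times, worst_info)
```

Because the objective is a sum over intervals, any reordering of the same interval lengths gives the same value, so the sequence returned for the best case is one representative among equivalent permutations.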

It is straightforward to see the exact same procedure can be used to find the sampling sequence 
that minimizes the Fisher information, thus giving the worst sampling sequence. It turns out that 
the worst sampling sequence in this case coincides with the worst sequence derived under the 
sparsity condition, i.e., it is also the uniform sequence. 
F. A comparison

We now compare the different sensing sequences obtained in this section using an example. They are illustrated in Figure 3(a). In this example the channel parameters are $E[T_0] = 5$ and $E[T_1] = 3$ time units, respectively. The time window is set to 40 time units, and the channel can only be sensed 5 times. Shown in the figure are the uniform sensing sequence, the best/worst sensing sequences derived under the sparsity condition, and the best/worst sequences derived using dynamic programming. As mentioned earlier, the worst sequence obtained via dynamic programming coincides with the uniform sampling sequence. The worst under the sparsity condition also coincides with the uniform sequence, a fact proven in Theorem 1, as the sparsity condition holds in this case. In Figure 3(b) we compare the performance of these sampling strategies, setting the time window to 5000 time units. The estimated value under each strategy is shown as a function of the number of samples taken. The true value is also shown for comparison. These are used as benchmarks in the next section in evaluating random sensing schemes.



[Figure 3(a): sampling points over a window of 40 time units for the uniform/worst by DP/worst under sparsity condition sequence, the best by DP sequence, and the best under sparsity condition sequence. Figure 3(b): estimated value versus sample number (T = 5000) for E[T_0] = 5, E[T_1] = 3; curves: actual value, best by DP, uniform/worst by DP/worst under sparsity condition, best under sparsity condition.]

(a) Illustration of different sampling sequences (b) Performance comparison

Fig. 3. Comparison of different sampling sequences



As we can see from Figure 3(a), the best sensing sequence produced by dynamic programming without the sparsity condition also appears to be uniform except for the last sample, as is the case with the best sequence under the sparsity condition. The difference is that the former uses a smaller interval value that violates the sparsity condition. As mentioned earlier, if we were to remove the requirement that one sample be placed at time $T$, then the optimal sequence of $m$ samples would appear to be uniform (again, this conclusion is drawn empirically in the case of no sparsity requirement, and precisely and analytically in the case of sparsity), with the optimal interval being the value that maximizes (15). Interestingly, the worst sequence is also uniform with or without the sparsity condition.

Footnote: Note however that this conclusion is drawn empirically from a large number of numerical experiments in the case of not requiring sparsity. By contrast, under the sparsity condition the conclusion is drawn analytically in Theorem 2.

What this result suggests is that in the ideal case if we have a priori knowledge of the channel 
parameters, to maximize the Fisher information the best thing to do is indeed to sense uniformly. 
The difficulty of course is that without this knowledge we have no way of deciding what the 
optimal interval should be, and uniform sensing would be a bad decision as it could turn out to 
be the worst with an unfortunate choice of the sampling interval. 

In such cases, the robust thing to do is simply to sense randomly, so that with some probability 
we will have sampling intervals close to the actual optimum. This is investigated in the next 
section. 

V. Random sensing 

Under a random sensing scheme, the sampling intervals $\Delta t_i$ are generated according to some distribution $f(\Delta t)$ (this may be done independently or jointly). Below we first analyze how the resulting Fisher information is affected, and then use a family of distributions generated by the circular $\beta$ ensemble to examine the performance of different distributions.

A. Effect on the Fisher information 

We begin by examining the expectation of the Fisher function, averaged over randomly 
generated sampling intervals, calculated as follows: 

$$E[g(\Delta t)] = \int g(\Delta t) f(\Delta t)\, d\Delta t \qquad (19)$$

$$= \int \left[ g(\mu_0) + g'(\mu_0)(\Delta t - \mu_0) + \cdots + \frac{g^{(n)}(\mu_0)(\Delta t - \mu_0)^n}{n!} + \cdots \right] f(\Delta t)\, d\Delta t$$

$$= g(\mu_0) + g'(\mu_0)\mu_1 + \cdots + \frac{g^{(n)}(\mu_0)}{n!}\mu_n + \cdots,$$

where the Taylor expansion is around the expected sampling interval $\mu_0 = E[\Delta t]$, or $T/(m-1)$ for a given window $T$ and $m$ samples taken, and $\mu_n = \int (\Delta t - \mu_0)^n f(\Delta t)\, d\Delta t$ is the $n$th order central moment of $\Delta t$.

In order to have a fair comparison we will assume $T$ and $m$ are fixed, thus fixing the average sampling interval $\mu_0$ under different sampling schemes. Also note that the value $g^{(n)}(\mu_0)$ is completely determined by the channel statistics and not the sampling sequence. Consequently the expected value of the Fisher function is affected by the selection of a sampling scheme only through the higher order central moments of the distribution $f(\cdot)$. Note that the expectation of the Fisher function under uniform sampling with constant sampling interval $\mu_0$ is simply $g(\mu_0)$ (i.e., only the first term on the right hand side remains). Therefore any random scheme improves upon this if it results in a positive sum over the higher order terms. While the above equation does not immediately lead to an optimal selection of a random scheme, it is possible to seek one from a family of distribution functions through optimization over common parameters.
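The effect of the interval distribution on the expected Fisher function can be checked by direct numerical integration of the first line of (19). The following is a sketch under stated assumptions rather than the computation used in the paper: the closed form of $g$ is the product $h_0 h$ from Appendix A, the uniform scheme is taken to be uniform on $[0, 2\mu_0]$, and the integration grid and the helper names are arbitrary choices.

```python
import numpy as np


def fisher_g(dt, theta0, u):
    """Fisher function g(dt), vectorized (assumed closed form h0 * h)."""
    dt = np.asarray(dt, dtype=float)
    x = theta0 * dt / u
    y = np.exp(-x)
    one_minus_y = -np.expm1(-x)                   # accurate 1 - exp(-x) near zero
    h0 = dt ** 2 * y / u ** 2
    h = (2 * u * (1 - u) * y / one_minus_y
         + u ** 2 * (1 - u) * y / ((1 - u) + u * y)
         + u * (1 - u) ** 2 * y / (u + (1 - u) * y))
    return h0 * h


def expected_g(pdf, theta0, u, upper, n=200001):
    """Trapezoidal approximation of the integral of g(t) * pdf(t) over (0, upper]."""
    t = np.linspace(1e-9, upper, n)
    vals = fisher_g(t, theta0, u) * pdf(t)
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(t)) / 2)


def exponential_pdf(mean):
    return lambda t: np.exp(-t / mean) / mean


def uniform_pdf(mean):
    # Uniform on [0, 2 * mean], so that the mean interval equals `mean`.
    return lambda t: np.where(t <= 2 * mean, 1 / (2 * mean), 0.0)


theta0, u, mu0 = 1 / 5, 3 / 8, 10.0               # E[T0] = 5, E[T1] = 3, T/(m-1) = 10
print("deterministic:", float(fisher_g(mu0, theta0, u)))
print("uniform      :", expected_g(uniform_pdf(mu0), theta0, u, upper=2 * mu0))
print("exponential  :", expected_g(exponential_pdf(mu0), theta0, u, upper=40 * mu0))
```

For these example parameters the fixed interval gives the smallest value, which is what the higher order terms of the expansion predict when $g$ is convex around $\mu_0$.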

Before we proceed with this in the next subsection, we compare the normal, uniform and exponential random sampling schemes using the above analysis. In Table I we list the higher order central moments of the normal, uniform and exponential distributions. It can be easily concluded that among these three choices the Fisher function has the largest expectation under the exponential distribution.



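TABLE I
Higher central moments

              Normal                                    Uniform                  Exponential
n is even     $\frac{n!\,\sigma^n}{(n/2)!\,2^{n/2}}$    $\frac{\mu_0^n}{n+1}$    $\mu_0^n \sum_{k=0}^{n} \frac{(-1)^k\, n!}{k!}$
n is odd      $0$                                       $0$                      $\mu_0^n \sum_{k=0}^{n} \frac{(-1)^k\, n!}{k!}$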



We further compare their performance in Fig. 4 as we increase the number of samples $m$ over a window of $T = 5000$ time units. Our simulation is done in Matlab and uses a discrete time model; all time quantities are in the same time units. The maximum number of samples is 5000; this is because the on/off periods are integers, so there is no reason to sample faster than once per unit of time. The sampling intervals under uniform sensing are $\lfloor T/(m-1) \rfloor$. The sampling times under random schemes are generated as follows. We fix the window $T$ and take $m$ to be the average number of samples. We place the first and the last sampling times at times 0 and $T$, respectively. We then sequentially generate $\Delta t_2, \Delta t_3, \ldots$ according to the given pdf $f(\cdot)$ with parameters normalized such that it has a mean (sampling interval) of $T/(m-1)$. For each $\Delta t_i$ we generate, we place a sampling point at time $\sum_{k=2}^{i} \Delta t_k$. This process stops when this quantity exceeds $T$. Note that under this procedure the last sampling interval will not be exactly distributed according to $f(\cdot)$ since we have placed a sampling point at time $T$. However, this approximation seems unavoidable. Alternatively we can allow $T$ to differ from one trial to another while maintaining the same average. As long as $T$ is sufficiently large this procedure does not affect the accuracy or the fairness of the comparison. For each value of $m$, the result shown in the figure is the average of 100 randomly generated sensing schedules. We see that exponential random sampling outperforms the other two; this is consistent with our earlier analysis of the Fisher information.

Footnote: For the normal distribution the probability distribution function is cut off at zero and then renormalized.
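The sequential generation procedure described above can be sketched as follows. This is an illustration in Python rather than the Matlab code used for the simulations; the function name and the two interval samplers are hypothetical, and the truncated normal case is omitted because its variance is not specified here.

```python
import random


def random_sampling_times(T, m, draw_interval):
    """Sampling times in [0, T]: fixed points at 0 and T, random intervals in between.

    draw_interval(mean) must return one non-negative interval with the given mean.
    """
    mean = T / (m - 1)                    # target average sampling interval
    times = [0.0]
    t = draw_interval(mean)
    while t < T:                          # stop once the running sum reaches T
        times.append(t)
        t += draw_interval(mean)
    times.append(float(T))                # the last interval is truncated at T
    return times


rng = random.Random(0)
exponential = lambda mean: rng.expovariate(1.0 / mean)
uniform = lambda mean: rng.uniform(0.0, 2.0 * mean)

schedule = random_sampling_times(5000, 100, exponential)
print(len(schedule), schedule[:4], schedule[-1])
print(len(random_sampling_times(5000, 100, uniform)))
```

The number of generated points varies from run to run, which is why $m$ can only be matched on average.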



[Figure 4: estimated value versus average sample number (T = 5000) for E[T_0] = 2, E[T_1] = 1; curves: actual value, normal, uniform, exponential, uniform/worst by DP/worst under sparsity condition, best under sparsity condition, best by DP.]

Fig. 4. Performance comparison of random sensing: Normal vs. Uniform vs. Exponential



Footnote: The reason $m$ is only an average and not an exact requirement is that we cannot guarantee exactly $m$ samples within a window of $T$ if we generate sampling intervals randomly according to a given pdf. By allowing $m$ to be an average we can simply require the pdf to have a mean of $T/(m-1)$.
B. Circular $\beta$ ensemble

We now use the circular $\beta$ ensemble [24] to study a family of distributions. The advantage of using this ensemble is that with a single tunable parameter we can approximate a wide range of different distributions while keeping the same average sampling rate.

The circular $\beta$ ensemble may be viewed as given by $n$ eigenvalues, denoted as $\lambda_j = e^{i\theta_j}$, $j = 1, \ldots, n$. These eigenvalues have a joint probability density function proportional to the following:

$$\prod_{1 \le k < l \le n} \left| e^{i\theta_k} - e^{i\theta_l} \right|^{\beta}, \quad -\pi < \theta_j < \pi,\ j, k, l = 1, \ldots, n, \qquad (20)$$

where $\beta > 0$ is a model parameter. In the special cases $\beta = 1$, 2 and 4, this ensemble describes the joint probability density of the eigenvalues of random orthogonal, unitary and symplectic matrices, respectively [24].

We use the set of eigenvalues generated from the above joint pdf to determine the placement of sample points in the interval $[0, T]$ in the following manner. In [25] a procedure is introduced to generate a set of values $\theta_j$, $j = 1, 2, \ldots, n$ that follow the joint pdf given by (20). Setting $n = m$, these $n$ eigenvalues are then placed along a unit circle (each at the position given by $\theta_j$), and subsequently mapped onto the line segment $[0, 1]$. Scaling this segment to $[0, T]$ gives us the $m$ sampling times. The intervals between these points now follow a certain joint distribution. As $\beta$ varies we obtain a family of distributions indexed by $\beta$. Below we will refer to this method of generating sample points/intervals as using the circular $\beta$ ensemble. Note that with this procedure we cannot guarantee that samples are taken at times 0 and $T$. However, since the same window size $T$ and number of samples $m$ are used, we maintain the same average sampling rate.
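For the special case $\beta = 2$ the mapping from eigenvalues to sampling times can be illustrated without the generator of [25], since the eigenvalues of a Haar-distributed random unitary matrix follow (20) with $\beta = 2$. The sketch below uses the standard QR-based construction of a Haar unitary matrix; it is a substitute technique for this one value of $\beta$ only, not the procedure of [25], and the function name is hypothetical.

```python
import numpy as np


def cue_sampling_times(m, T, rng):
    """m sampling times in [0, T] from the eigenvalue angles of a Haar unitary matrix.

    Covers only beta = 2 (circular unitary ensemble); general beta needs the
    generator referenced in the text.
    """
    z = (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    q = q * (d / np.abs(d))                      # phase fix so that q is Haar distributed
    angles = np.angle(np.linalg.eigvals(q))      # eigenvalue angles in (-pi, pi]
    unit = (angles + np.pi) / (2 * np.pi)        # unit circle mapped onto [0, 1]
    return np.sort(unit * T)                     # scaled to [0, T]


rng = np.random.default_rng(0)
times = cue_sampling_times(200, 5000.0, rng)
intervals = np.diff(times)
print(times[:5])
print(intervals.mean(), intervals.std())
```

The printed standard deviation of the intervals is noticeably smaller than their mean, reflecting the eigenvalue repulsion that places $\beta = 2$ between the exponential-like and deterministic extremes.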

In Fig. 5 we give the pdfs of intervals generated by the circular $\beta$ ensemble with different $\beta$. For each value of $\beta$, we use the generating method in [25] to obtain 200 random variables in $[0, 1]$, then scale them to be in $[0, 5000]$. The successive intervals between neighboring points are collected, with their pdf shown in the figure. We can see that as $\beta$ approaches $0^+$ the pdf becomes exponential-like and as $\beta$ approaches $+\infty$ the pdf becomes deterministic; these are well known facts about circular ensembles.
Fig. 5. Probability distribution function of intervals generated by the circular $\beta$ ensemble



C. A comparison between different random sensing schemes 

In Fig. 6 we show the Fisher information with sampling intervals generated by the circular $\beta$ ensemble. The corresponding estimation performance comparison is given in Fig. 7. The performance of the best and worst sequences with and without the sparsity condition is also shown for comparison. Note that when β = 10^, the sampling sequence coincides with the worst obtained via dynamic programming, the worst under the sparsity condition and uniform sensing; therefore their performances are the same.
Fig. 6. Comparison of Fisher information with intervals generated by the circular $\beta$ ensemble for an exponential channel

Fig. 7. Performance comparison of random sensing with intervals generated by the circular $\beta$ ensemble for an exponential channel



We see again that exponentially generated sampling intervals perform the best. This may be due to the fact that the on/off durations are also exponentially distributed, thereby creating a good "match" between the Fisher function $g(\cdot)$ and the pdf $f(\cdot)$ that results in a larger value of the expected Fisher function (see Eqn. (19)).



D. Discussions on other channel models 

So far all our analysis and results are based on the exponential channel model. The problem quickly becomes intractable if we move away from this model, though the basic insight should hold. We now examine a channel model with on/off durations following the gamma distribution. The pdfs of the on/off durations are expressed as

$$f_0(t) = \frac{t^{k_0-1} e^{-t/\lambda_0}}{\lambda_0^{k_0}\, \Gamma(k_0)}, \qquad f_1(t) = \frac{t^{k_1-1} e^{-t/\lambda_1}}{\lambda_1^{k_1}\, \Gamma(k_1)}. \qquad (21)$$

They are each parameterized by a shape parameter $k$ and a scale parameter $\lambda$, both of which are positive. In this case, the Laplace transforms of $f_0(t)$ and $f_1(t)$ are $(1+\lambda_0 s)^{-k_0}$ and $(1+\lambda_1 s)^{-k_1}$, respectively, and the expectations of the on/off periods are $E[T_1] = k_1\lambda_1$ and $E[T_0] = k_0\lambda_0$. In the following simulation both $k_1$ and $k_0$ are set to 2, with a simulated time of 5000 time units. The channel parameters are set to $E[T_1] = 10$ and $E[T_0] = 20$ time units. The sampling intervals are randomly generated by the circular $\beta$ ensemble. We see that random sensing again outperforms uniform sensing under this channel model.
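A gamma on/off channel of this kind is straightforward to simulate. The sketch below is illustrative only: the function name is hypothetical, the channel is assumed to start in the off state at time 0, durations are drawn as continuous values, and the state is read out at integer times.

```python
import numpy as np


def simulate_gamma_channel(T, mean_off, mean_on, k_off, k_on, rng):
    """Channel state (0 = off, 1 = on) at integer times 0, 1, ..., T - 1.

    Off/on durations alternate and are gamma distributed with shape k and
    scale mean / k, so that the mean duration equals k * scale.
    """
    durations = []
    total, i = 0.0, 0
    while total <= T:                              # generate periods until T is covered
        if i % 2 == 0:
            d = rng.gamma(k_off, mean_off / k_off)
        else:
            d = rng.gamma(k_on, mean_on / k_on)
        durations.append(d)
        total += d
        i += 1
    edges = np.cumsum(durations)                   # end time of each period
    period = np.searchsorted(edges, np.arange(T), side="right")
    return period % 2                              # even periods are off, odd are on


rng = np.random.default_rng(0)
states = simulate_gamma_channel(5000, 20.0, 10.0, 2.0, 2.0, rng)
print(states[:40])
print("empirical utilization:", states.mean())     # expected to be near 10 / (10 + 20)
```

Sampling this state sequence at the times produced by any of the sensing schemes gives the observations on which the estimators operate.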
[Figure 8, panel title: Gamma distribution, E[T_0] = 20, E[T_1] = 10, k_0 = k_1 = 2.]

Fig. 8. Performance comparison of random sampling with intervals generated by the circular $\beta$ ensemble for gamma channel model



It should be noted that since the gamma distribution is the conjugate prior for the exponential distribution, the latter being a special case of the former, this result is not surprising. Unfortunately, obtaining similar results for other channel distributions becomes computationally prohibitive. The complexity is due to two reasons. Firstly, for most distributions the Laplace transform is complex, which makes it difficult to obtain the corresponding time domain expressions. Secondly, with the exception of the exponential distribution, without the memoryless property the likelihood function also becomes intractable.



VI. Adaptive Random Sensing for Parameter Tracking 

Using insights we have obtained on uniform sensing and random sensing, we now present a method of estimating and tracking a time-varying parameter. This is a moving window based estimation scheme, where the overall sensing duration $T$ is divided into windows of length $T_w$. In each window samples are taken and an estimate is produced at the end of that window. This estimate is then used to determine the optimal number of samples to be taken in the next window. This method will be referred to as the adaptive random sensing scheme. The adaptive nature of the scheme comes from adjusting the number of samples taken in each window based on past estimates.

Specifically, at the end of the $i$-th window of $T_w$, we obtain the ML estimates $\hat{\theta}_0^{(i)}$ and $\hat{u}^{(i)}$ based on samples collected during that window. Now assuming that we will use uniform sensing in the $(i+1)$th window with a sampling interval $\Delta t_p$, and assuming that $\hat{\theta}_0^{(i)}$ and $\hat{u}^{(i)}$ are the true parameter values in the $(i+1)$th window, we can obtain the expectation of the next estimate, denoted as $\tilde{\theta}_0^{(i+1)}$, as a function of $(T_w, \Delta t_p, \hat{u}^{(i)}, \hat{\theta}_0^{(i)})$. The optimal sampling interval $\Delta t_p^{(i+1)}$ for the $(i+1)$th window is then calculated as follows:



At^~^^^ = argmin 



|2(*+1) n{i)\ 



(22) 



where $\epsilon$ is an error factor introduced to lower bound the minimizing interval $\Delta t_p^{(i+1)}$. Without this factor the interval will end up being very small, i.e., requiring a large number of samples for the next window. The intuition behind the above formula is that, assuming the channel parameters vary relatively slowly in time, the estimate from the previous window $\hat{\theta}_0^{(i)}$ may be viewed as true. So for the next window we would like to find the sampling interval that allows us to get as close as possible to this value subject to an error.

Note that the above calculation relies on the availability of $\tilde{\theta}_0^{(i+1)}$, a quantity obtained assuming uniform sampling will be used in the next window. In the actual execution of the algorithm, we simply use this to obtain $\Delta t_p^{(i+1)}$ as shown above. This gives us the desired number of samples to be taken in the next window: $M^{(i+1)} = \lceil T_w/\Delta t_p^{(i+1)} \rceil$. Following this, random sensing is used to generate $M^{(i+1)}$ random sampling times within the next window. An estimate is then made and this process repeats.

It remains to show how $\tilde{\theta}_0^{(i+1)}$ is obtained. As mentioned earlier, when the on/off periods are exponentially distributed there is a simple closed-form solution to the ML estimator. This was calculated in Q and we will use that result directly below. Specifically, with $M = \lceil T_w/\Delta t_p \rceil$ samples uniformly taken, the estimate of channel utilization $u$ is given by $\hat{u} = \frac{1}{M}\sum_{i=1}^{M} z_i$. The estimate of $\theta_0$ is given by

$$\hat{\theta}_0 = -\frac{\hat{u}}{\Delta t_p} \ln\left[ \frac{-B + \sqrt{B^2 - 4AC}}{2A} \right], \qquad (23)$$

where

$$A = (\hat{u} - \hat{u}^2)(M-1),$$
$$B = -2A + (M-1) - (1-\hat{u})\,n_0 - \hat{u}\,n_3, \qquad (24)$$
$$C = A - \hat{u}\,n_0 - (1-\hat{u})\,n_3.$$

Here $n_0/n_1/n_2/n_3$ denotes the number of $(0 \to 0)/(0 \to 1)/(1 \to 0)/(1 \to 1)$ transitions out of the total $(M-1)$ transitions. Their respective expectations are given by

$$E[n_0] = M(1-u)P_{00}(\Delta t_p; \theta_0), \quad E[n_2] = M u P_{10}(\Delta t_p; \theta_0),$$
$$E[n_1] = M(1-u)P_{01}(\Delta t_p; \theta_0), \quad E[n_3] = M u P_{11}(\Delta t_p; \theta_0).$$

Taking these quantities into (24) and (23), we obtain the expectation of $\hat{\theta}_0$, denoted $\tilde{\theta}_0$, which is a function of $(T_w, \Delta t_p, u, \theta_0)$. Replacing $u$ with $\hat{u}^{(i)}$, $\theta_0$ with $\hat{\theta}_0^{(i)}$, and $\tilde{\theta}_0$ with $\tilde{\theta}_0^{(i+1)}$ we obtain the desired result.
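The closed-form estimator can be sketched as below. This is a hedged illustration, not the referenced derivation: it takes the quadratic form $Ay^2 + By + C = 0$ in $y = e^{-\theta_0 \Delta t_p/u}$ with the positive root, and it assumes the usual two-state Markov transition probabilities $P_{00} = (1-u) + uy$, $P_{01} = u(1-y)$, $P_{10} = (1-u)(1-y)$, $P_{11} = u + (1-u)y$, which are not written out in this section. Function names are hypothetical.

```python
import math


def transition_probs(dt, theta0, u):
    """Assumed transition probabilities of the sampled on/off channel."""
    y = math.exp(-theta0 * dt / u)
    return (1 - u) + u * y, u * (1 - y), (1 - u) * (1 - y), u + (1 - u) * y


def count_transitions(samples):
    """Counts [n0, n1, n2, n3] of 0->0, 0->1, 1->0, 1->1 transitions."""
    n = [0, 0, 0, 0]
    for a, b in zip(samples, samples[1:]):
        n[2 * a + b] += 1
    return n


def ml_theta0(n, u, dt):
    """ML estimate of theta0 from transition counts, utilization u, interval dt."""
    n0, n1, n2, n3 = n
    total = n0 + n1 + n2 + n3                     # M - 1 transitions
    A = (u - u * u) * total
    B = -2 * A + total - (1 - u) * n0 - u * n3
    C = A - u * n0 - (1 - u) * n3
    root = (-B + math.sqrt(B * B - 4 * A * C)) / (2 * A)
    # For real data the root can fall outside (0, 1); callers should guard for that.
    return -(u / dt) * math.log(root)


# Sanity check with counts set to their expected proportions.
theta0, u, dt, transitions = 0.2, 0.375, 4.0, 99
p00, p01, p10, p11 = transition_probs(dt, theta0, u)
expected_counts = [transitions * (1 - u) * p00, transitions * (1 - u) * p01,
                   transitions * u * p10, transitions * u * p11]
print(ml_theta0(expected_counts, u, dt))
print(count_transitions([0, 0, 1, 1, 0]))
```

In this check the counts are scaled by the number of transitions $M-1$, in which case the estimator returns the true $\theta_0$.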

Figure 9 shows the tracking performance of the adaptive random sensing algorithm, where within each moving window the sampling times are randomly placed following a uniform distribution. In the simulation the size of the time window is set to 3500 time units and the error factor $\epsilon$ is set at 1. In Figure 9(a) the channel parameter $E[T_0]$ varies as a step function: starting from 6 time units, it is increased by 5 every 30000 time units, while $E[T_1]$ is set to $E[T_0]/2$. In Figure 9(b) the channel parameter changes more smoothly as shown. The dashed line represents the actual channel parameter. For comparison purposes we also include the results from an adaptive uniform sensing algorithm. These are obtained by following the exact same adaptive procedure outlined above, with the only difference that in the $i$-th window uniform sensing is used, instead of random sensing, with a constant sampling interval of $\Delta t_p^{(i)}$. We see that the estimation under adaptive random sensing (RS) can closely track the time-varying channel, and clearly outperforms adaptive uniform sensing (US) at short on/off periods.

The number of samples taken in each window (or estimation cycle) following this adaptive scheme is given in Figure 10. It shows that as the on/off periods increase, the sampling rate is automatically decreased as an outcome of the tracking.

VII. A Discussion on the Applicability of Compressive Sensing 



Recent advances in compressive sensing theory [26], [27], [28] allow one to represent compressible/sparse signals with significantly fewer samples than required by the Nyquist sampling theorem. It is therefore particularly attractive in a resource constrained setting. This technique has been used in data compression [29], channel coding [30], analog signal sensing [31], routing [32] and data collection [33]. It is tempting to examine whether this technique brings any advantage for our channel estimation problem. The idea is to randomly sample the channel state, use compressive sensing techniques to reconstruct the entire sequence of channel state evolution, and then use the ML estimator to determine the channel parameter. Compared to the sensing schemes discussed in the previous sections, this is an indirect use of the ML estimator, in that the entire sequence will be reconstructed before the estimation. In this sense the use of compressive sensing also seems to be overkill for the purpose of parameter estimation.

Fig. 9. Estimation performance of time-varying channel

Fig. 10. Corresponding number of samples in each estimation window

Consider a discrete-time, finite, one-dimensional signal vector $x_{N \times 1}$, which can be expressed as $x = \Psi\alpha$, where $\Psi$ is an $N \times N$ basis matrix and $\alpha$ is a vector of weighting coefficients. The signal vector $x$ is $K$-sparse if $\alpha_{N \times 1}$ has only $K$ non-zero elements. The compressive sensing theory states that the signal $x$ can be reconstructed successfully from $M$ measurements $y$, obtained by projecting the signal $x$ onto another basis $\Phi$ that is incoherent with $\Psi$, i.e., $y = \Phi x = \Phi\Psi\alpha$. The required length of $y$, $M$, depends on the sparsity of the signal and the reconstruction algorithm. The reconstruction is typically done by solving the $\ell_1$-norm optimization problem: $\hat{\alpha} = \arg\min \|\alpha\|_1$, s.t. $y = \Phi\Psi\alpha$. Algorithmically this can be solved by linear programming or iterative greedy algorithms such as orthogonal matching pursuit (OMP) [34].

For our channel estimation problem, consider the signal $x = \{x_1, x_2, \ldots, x_N\}$ to be the discrete time 0-1 sequence of channel states, with $x_i$ denoting the channel state at time $i$. The physical nature of channel sensing implies that the measurement matrix $\Phi_{M \times N}$ consists of rows each containing only a single 1 in the position where the channel was sensed and 0 everywhere else. Specifically, a 1 in the position $(i,j)$ means that the $i$th measurement was taken at time $j$. In addition, there can only be one measurement taken at time $j$, i.e., no two rows can have a 1 in the same column. As $M < N$ in general (or it would not be compressive sensing), there will be exactly $N - M$ empty (all-0) columns, making the matrix extremely sparse. This poses a significant challenge since in general the $\Phi$ matrix is required to be dense (though randomly generated), with at least one non-zero entry in each column.
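The structure of this measurement matrix is easy to make concrete. The short sketch below builds $\Phi$ from a list of sensing times and counts its empty columns; the function name is hypothetical and times are taken to be 0-indexed for convenience.

```python
import numpy as np


def measurement_matrix(sample_times, N):
    """M x N selection matrix: row i has a single 1 at the time of the i-th sample."""
    times = np.asarray(sample_times, dtype=int)
    if len(set(times.tolist())) != len(times):
        raise ValueError("at most one measurement per time instant")
    phi = np.zeros((len(times), N), dtype=int)
    phi[np.arange(len(times)), times] = 1
    return phi


x = np.array([0, 1, 1, 0, 0, 1, 1, 0])           # channel state sequence, N = 8
phi = measurement_matrix([1, 4, 6], N=len(x))    # M = 3 sensing times
y = phi @ x                                      # measurements are the sampled states
empty_columns = int((phi.sum(axis=0) == 0).sum())
print(phi)
print(y, empty_columns)
```

Every measurement is just one entry of $x$, so $\Phi$ carries no information about the unsensed time instants, which is the source of the difficulty described above.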

For the reconstruction to be successful, two conditions need to be satisfied: the signal needs to be sparse in some domain (i.e., there exists a $\Psi$ such that $\alpha$ is sufficiently sparse), and the two matrices $\Phi$ and $\Psi$ need to be incoherent. Due to the binary property of the channel state sequence, it is difficult to find a basis matrix $\Psi$ that has dense entries. As a result we have two very sparse matrices and they are highly coherent. For these reasons we have not found compressive sensing to have an advantage in our channel estimation problem.

Figure 11 shows some comparison results. In the simulation of compressive sensing based estimation, we reconstruct the original state sequence using the Haar wavelet basis. All other conditions remain the same as in previous sections. The time window is set to 4096 time units. Overall, compressive sensing based estimation does not compare favorably with uniform sensing and random sensing, due to the coherence problem between the two matrices. It remains an interesting problem to find a good basis matrix that can both sparsify $x$ and at the same time be sufficiently incoherent with the measurement matrix. A similar difficulty was noted in [32] in trying to use compressive sensing for a data gathering problem. A number of commonly used transformations were considered, and it was found that, with real data sets, none of them was able to sparsify the data while being at the same time incoherent with the routing matrix.



(a) $E[T_0] = 2$, $E[T_1] = 1$ (b) $E[T_0] = 20$, $E[T_1] = 10$ (c) $E[T_0] = 100$, $E[T_1] = 5$

Fig. 11. Estimation performance comparison: random sensing vs. uniform sensing vs. CS based sensing

VIII. Conclusion 

In this paper we studied sensing schemes for a channel estimation problem under a sparsity condition. Using Fisher information as a performance measure, we derived the best and worst sensing sequences both with and without the sparsity condition. These sequences, while not exactly implementable, provide significant insights as well as useful benchmarks. We then examined the performance of random sensing schemes by comparing a family of distributions generated by the circular $\beta$ ensemble. Using these insights, an adaptive random sensing scheme was proposed to effectively track time-varying channel parameters. We also discussed the applicability of compressive sensing in this context.



Appendix A 
Proof of Lemma 1 

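Proof: For simplicity in presentation, we first write $g(\Delta t) = h_0(\Delta t) h(\Delta t)$, where

$$h_0(\Delta t) = \frac{\Delta t^2}{u^2}\, e^{-\theta_0 \Delta t/u},$$

$$h(\Delta t) = h_1(\Delta t) + h_2(\Delta t) + h_3(\Delta t),$$

where

$$h_1(\Delta t) = \frac{2u(1-u)\, e^{-\theta_0 \Delta t/u}}{1 - e^{-\theta_0 \Delta t/u}},$$

$$h_2(\Delta t) = \frac{u^2(1-u)\, e^{-\theta_0 \Delta t/u}}{(1-u) + u e^{-\theta_0 \Delta t/u}},$$

$$h_3(\Delta t) = \frac{u(1-u)^2\, e^{-\theta_0 \Delta t/u}}{u + (1-u) e^{-\theta_0 \Delta t/u}}.$$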

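We proceed to show that each of the above functions is convex under Condition 1. We first show that $h_0(\Delta t)$ is strictly convex for $\Delta t > (2+\sqrt{2})u/\theta_0$. Under this condition and noting $0 < u < 1$ and $\theta_0 > 0$ we have

$$h_0'(\Delta t) = \frac{\Delta t}{u^2}\, e^{-\theta_0 \Delta t/u} \left( 2 - \frac{\theta_0 \Delta t}{u} \right) < 0,$$

$$h_0''(\Delta t) = \frac{1}{u^2}\, e^{-\theta_0 \Delta t/u} \left( 2 - \frac{4\theta_0 \Delta t}{u} + \frac{\theta_0^2 \Delta t^2}{u^2} \right) > 0.$$

Therefore for $\theta_0 \Delta t/u > 2 + \sqrt{2}$, $h_0(\Delta t)$ is strictly convex. That $h_1(\Delta t)$ is strictly convex is straightforward. Since $0 < u < 1$ and $\theta_0 > 0$, we have:

$$h_1'(\Delta t) = \frac{-2(1-u)\theta_0\, e^{-\theta_0 \Delta t/u}}{(1 - e^{-\theta_0 \Delta t/u})^2} < 0,$$

$$h_1''(\Delta t) = \frac{2(1-u)\theta_0^2\, e^{-\theta_0 \Delta t/u}(1 + e^{-\theta_0 \Delta t/u})}{u\,(1 - e^{-\theta_0 \Delta t/u})^3} > 0.$$

Next we show that $h_2(\Delta t)$ is strictly convex for $\Delta t > \frac{u}{\theta_0} \ln\left(\frac{u}{1-u}\right)$. This condition is equivalent to $u e^{-\theta_0 \Delta t/u} < 1 - u$. Under this condition and again noting $0 < u < 1$ and $\theta_0 > 0$, we have

$$h_2'(\Delta t) = \frac{-u(1-u)^2 \theta_0\, e^{-\theta_0 \Delta t/u}}{[(1-u) + u e^{-\theta_0 \Delta t/u}]^2} < 0,$$

$$h_2''(\Delta t) = \frac{(1-u)^2 \theta_0^2\, e^{-\theta_0 \Delta t/u}\, [(1-u) - u e^{-\theta_0 \Delta t/u}]}{[(1-u) + u e^{-\theta_0 \Delta t/u}]^3} > 0.$$

Similarly, $h_3(\Delta t)$ is strictly convex under the condition $\Delta t > \frac{u}{\theta_0} \ln\left(\frac{1-u}{u}\right)$, since

$$h_3'(\Delta t) = \frac{-u(1-u)^2 \theta_0\, e^{-\theta_0 \Delta t/u}}{[u + (1-u) e^{-\theta_0 \Delta t/u}]^2} < 0,$$

$$h_3''(\Delta t) = \frac{(1-u)^2 \theta_0^2\, e^{-\theta_0 \Delta t/u}\, [u - (1-u) e^{-\theta_0 \Delta t/u}]}{[u + (1-u) e^{-\theta_0 \Delta t/u}]^3} > 0.$$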

Therefore under the condition $\Delta t > \alpha u/\theta_0$, $h_1$, $h_2$ and $h_3$ are all monotonically decreasing convex functions. It follows that $h = h_1 + h_2 + h_3$ is also monotonically decreasing and convex. Furthermore, for any $\Delta t > 0$, $h_0(\Delta t) > 0$, and $h(\Delta t) > h(+\infty) = 0$. We can now show that $g$ is strictly convex under this condition:

$$g''(\Delta t) = \big(h_0(\Delta t) h(\Delta t)\big)'' = h_0''(\Delta t) h(\Delta t) + 2 h_0'(\Delta t) h'(\Delta t) + h_0(\Delta t) h''(\Delta t) > 0, \qquad (26)$$

where the inequality holds because every term on the right hand side is positive under the condition $\Delta t > \alpha u/\theta_0$ as summarized above. ■
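The convexity claim can be spot-checked numerically with second differences. The sketch below is only a sanity check, not a substitute for the proof; it assumes the product form $g = h_0 h$ with the $h_i$ as defined at the start of this appendix and $\alpha = \max\{2+\sqrt{2}, \ln\frac{u}{1-u}, \ln\frac{1-u}{u}\}$, and the grid span and resolution are arbitrary.

```python
import math


def fisher_g(dt, theta0, u):
    """Fisher function g(dt) = h0(dt) * h(dt) (assumed closed form)."""
    y = math.exp(-theta0 * dt / u)
    h0 = dt * dt * y / (u * u)
    h = (2 * u * (1 - u) * y / (1 - y)
         + u * u * (1 - u) * y / ((1 - u) + u * y)
         + u * (1 - u) ** 2 * y / (u + (1 - u) * y))
    return h0 * h


def sparsity_alpha(u):
    """Assumed constant alpha of the condition dt > alpha * u / theta0."""
    return max(2 + math.sqrt(2), math.log(u / (1 - u)), math.log((1 - u) / u))


def convex_on_sparse_region(theta0, u, span=8.0, n=400):
    """True if all second differences of g are positive on a grid above alpha*u/theta0."""
    lo = sparsity_alpha(u) * u / theta0
    step = span * (u / theta0) / n
    g = [fisher_g(lo + i * step, theta0, u) for i in range(n + 1)]
    return all(g[i - 1] - 2 * g[i] + g[i + 1] > 0 for i in range(1, n))


for u in (0.1, 0.375, 0.5, 0.9):
    print(u, convex_on_sparse_region(0.2, u))
```

Below the threshold the function need not be convex; near its maximum, for instance, the second difference is negative.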